GPU-Accelerated Primal Learning for Extremely Fast Large-Scale Classification: Supplementary Material

Neural Information Processing Systems

Speedups were tested for both batch gradient descent (with a 0.001 learning rate) and L-BFGS. Let $\mathbb{1}$ denote the indicator function. TRON is detailed in Algorithm 1. The other direction is slightly different.
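For concreteness, the batch-gradient-descent baseline can be sketched in NumPy as full-batch descent with the stated fixed learning rate of 0.001. The sketch applies it to the L2-regularized logistic regression primal f(w) = 0.5‖w‖² + C Σ_i log(1 + exp(−y_i wᵀx_i)), with labels y_i in {−1, +1}. The penalty C = 1, the iteration count, and all function names are illustrative assumptions rather than the configuration used in these experiments.

```python
import numpy as np


def lr_objective(w, X, y, C=1.0):
    """0.5 * ||w||^2 + C * sum_i log(1 + exp(-y_i * w^T x_i)), labels in {-1, +1}."""
    margins = y * (X @ w)
    # logaddexp(0, -m) = log(1 + exp(-m)), evaluated without overflow.
    return 0.5 * (w @ w) + C * np.sum(np.logaddexp(0.0, -margins))


def lr_gradient(w, X, y, C=1.0):
    """Gradient of lr_objective: w + C * X^T (y * (sigmoid(margins) - 1))."""
    margins = y * (X @ w)
    sig = 0.5 * (1.0 + np.tanh(0.5 * margins))  # numerically stable sigmoid
    return w + C * (X.T @ (y * (sig - 1.0)))


def batch_gradient_descent(X, y, C=1.0, lr=1e-3, n_iters=500):
    """Full-batch gradient descent with a fixed step size; records the objective."""
    w = np.zeros(X.shape[1])
    history = [lr_objective(w, X, y, C)]
    for _ in range(n_iters):
        # Every step touches all rows of X (no mini-batching).
        w = w - lr * lr_gradient(w, X, y, C)
        history.append(lr_objective(w, X, y, C))
    return w, history
```

Calling `batch_gradient_descent(X, y)` returns the final weights together with the objective value after every step. This makes it straightforward to compare its convergence against L-BFGS or TRON on the same data.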


GPU-Accelerated Primal Learning for Extremely Fast Large-Scale Classification

One of the most efficient methods to solve L2-regularized primal problems, such as logistic regression and linear support vector machine (SVM) classification, is the widely used trust region Newton algorithm, TRON. While TRON has recently been shown to enjoy substantial speedups on shared-memory multi-core systems, exploiting graphics processing units (GPUs) to speed up the method is significantly more difficult, owing to the highly complex and heavily sequential nature of the algorithm. In this work, we show that by applying judicious GPU-optimization principles, TRON training time for different losses and feature representations may be drastically reduced. For sparse feature sets, we show that using GPUs to train logistic regression classifiers in LIBLINEAR is up to an order of magnitude faster than using multithreading alone. For dense feature sets, which impose far more stringent memory constraints, we show that GPUs substantially reduce the lengthy SVM learning times required for state-of-the-art proteomics analysis, leading to dramatic improvements over recently proposed speedups.
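To show where the sequential structure and the GPU-friendly work sit, here is a compact NumPy sketch of a trust-region Newton method for the L2-regularized logistic regression objective mentioned in the abstract. Each outer iteration approximately solves the subproblem min_s gᵀs + 0.5 sᵀHs subject to ‖s‖ ≤ Δ with a Steihaug-style truncated conjugate gradient. The Hessian H = I + C XᵀDX, with D_ii = σ(m_i)(1 − σ(m_i)), is only ever touched through Hessian-vector products. Those products, like the function and gradient evaluations, reduce to matrix-vector multiplications with X and Xᵀ, which is the work one would expect to move to a GPU. The acceptance threshold, radius-update constants, CG tolerance, initial radius, and all names are simplified assumptions; LIBLINEAR's TRON uses more refined radius-update rules.

```python
import numpy as np


def sigmoid(z):
    """Numerically stable logistic function."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def fun_grad(w, X, y, C):
    """Objective, gradient, and Hessian weights D for L2-regularized LR."""
    m = y * (X @ w)
    sig = sigmoid(m)
    f = 0.5 * (w @ w) + C * np.sum(np.logaddexp(0.0, -m))
    g = w + C * (X.T @ (y * (sig - 1.0)))
    D = sig * (1.0 - sig)  # diagonal of D in H = I + C X^T D X
    return f, g, D


def hess_vec(v, X, D, C):
    """Compute (I + C X^T D X) v with two matrix-vector products."""
    return v + C * (X.T @ (D * (X @ v)))


def truncated_cg(g, X, D, C, delta, tol=0.1, max_iter=250):
    """Approximately minimize g^T s + 0.5 s^T H s subject to ||s|| <= delta."""
    s = np.zeros_like(g)
    r = -g
    d = r.copy()
    rr = r @ r
    stop = tol * np.linalg.norm(g)
    for _ in range(max_iter):
        if np.sqrt(rr) <= stop:
            break
        Hd = hess_vec(d, X, D, C)
        alpha = rr / (d @ Hd)  # H is positive definite, so d @ Hd > 0
        if np.linalg.norm(s + alpha * d) > delta:
            # The CG step leaves the trust region: stop on its boundary.
            sd, dd, ss = s @ d, d @ d, s @ s
            tau = (-sd + np.sqrt(sd * sd + dd * (delta * delta - ss))) / dd
            return s + tau * d
        s = s + alpha * d
        r = r - alpha * Hd
        rr_new = r @ r
        d = r + (rr_new / rr) * d
        rr = rr_new
    return s


def tron_sketch(X, y, C=1.0, eps=1e-3, max_outer=50):
    """Simplified trust-region Newton method; labels y must be in {-1, +1}."""
    w = np.zeros(X.shape[1])
    f, g, D = fun_grad(w, X, y, C)
    g0_norm = np.linalg.norm(g)
    delta = g0_norm  # initial trust-region radius (assumed choice)
    for _ in range(max_outer):
        if np.linalg.norm(g) <= eps * g0_norm:
            break
        s = truncated_cg(g, X, D, C, delta)
        pred = -(g @ s + 0.5 * (s @ hess_vec(s, X, D, C)))  # model reduction
        # g_new and D_new are computed eagerly for brevity; they are only
        # needed if the step is accepted.
        f_new, g_new, D_new = fun_grad(w + s, X, y, C)
        rho = (f - f_new) / pred  # actual / predicted reduction
        if rho > 1e-4:  # accept the step (assumed threshold)
            w, f, g, D = w + s, f_new, g_new, D_new
        # Simplified radius update (assumed constants).
        if rho < 0.25:
            delta *= 0.25
        elif rho > 0.75 and np.linalg.norm(s) >= 0.99 * delta:
            delta *= 2.0
    return w
```

The accept/reject test on `rho` and the CG loop are inherently sequential, while every expensive line reduces to `X @ v` or `X.T @ u`. In this sketch, that split is what separates control flow from bulk computation.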


Review for NeurIPS paper: GPU-Accelerated Primal Learning for Extremely Fast Large-Scale Classification

Summary and Contributions: The trust region Newton algorithm (TRON) is one of the most efficient solvers for L2-regularized primal problems, e.g., logistic regression and linear SVM classification. Due to the complex and sequential nature of this algorithm, its past performance gains have largely been driven by shared-memory multi-core systems. This paper demonstrates significant reductions in TRON training time compared to multithreaded implementations by applying GPU-specific optimization principles. The authors apply targeted optimizations to sparse-representation problems, namely logistic regression (LR) training, and to dense-representation problems, namely SVM training, achieving significant training-time speedups on GPUs. Specifically, for datasets with sparse feature representations and the LR loss function, the authors prescribe optimizations that minimize the sequential dependence between CPU and GPU execution by assuming that all conditional branches evaluate in favor of the high-compute operations, which can then be run pre-emptively on the GPU.
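The branch speculation described in this summary can be illustrated outside CUDA with a small Python sketch. In a trust-region step, the gradient at the trial point w + s is only needed if the step is accepted. The sketch therefore submits that gradient eagerly, on the assumption that the acceptance branch will be taken, and discards it if the step is rejected. A `concurrent.futures.ThreadPoolExecutor` stands in for asynchronous GPU work. The function names, the acceptance threshold `eta`, and the choice of the L2-regularized logistic loss are illustrative assumptions, not the authors' implementation.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def objective(w, X, y, C):
    """L2-regularized logistic loss, labels in {-1, +1}."""
    m = y * (X @ w)
    return 0.5 * (w @ w) + C * np.sum(np.logaddexp(0.0, -m))


def gradient(w, X, y, C):
    """Gradient of objective; the expensive X^T product we want to overlap."""
    m = y * (X @ w)
    sig = 0.5 * (1.0 + np.tanh(0.5 * m))  # numerically stable sigmoid
    return w + C * (X.T @ (y * (sig - 1.0)))


def try_step_speculative(pool, w, f, g, s, pred, X, y, C, eta=1e-4):
    """Test the trial point w + s, speculating that the step will be accepted.

    The gradient at w + s is submitted to the pool before the acceptance
    test is resolved, so it may overlap with the objective evaluation.
    Returns (accepted, w, f, g) for the next iteration.
    """
    w_trial = w + s
    grad_future = pool.submit(gradient, w_trial, X, y, C)  # speculative work
    f_trial = objective(w_trial, X, y, C)  # may run concurrently
    rho = (f - f_trial) / pred  # actual vs. predicted reduction
    if rho > eta:
        # Branch resolved as speculated: reuse the precomputed gradient.
        return True, w_trial, f_trial, grad_future.result()
    # Mis-speculation: cancel() only succeeds if the task has not started;
    # either way the speculative gradient is discarded.
    grad_future.cancel()
    return False, w, f, g
```

In this sketch a correct guess lets the gradient computation overlap with the objective evaluation, while a wrong guess only wastes the speculative work.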

